Research Question
How left or right are EU directives and regulations introduced between 1989 and 2024?

Data Collection

The directives and regulations used in this research come from two sources. The first is the CEPS EurLex dataset, which contains around 70k directives and regulations from 1989 to 2019. The second source is the EUR-Lex website, an official EU website providing access to EU law. EUR-Lex can be scraped using the eurlex R package, which has been done to retrieve post-2019 legislation up to August 2024 (the time of writing of this report).

EUR-Lex also provides a summary for some directives and regulations (see this example). Summaries are provided on a case-by-case basis, as described here.

Overview Directives and Regulations

Count of Directives and Regulations

## 
##  Directive Regulation 
##       3034      72536

Oldest Legislation

## [1] "1989-01-02"

Newest Legislation

## [1] "2024-08-14"

Directives and Regulations by Year

Overview Summaries

Number of Summaries

## [1] 1637

Summary Count of Regulations and Directives

## 
##  Directive Regulation 
##        625       1012

Oldest Summary

## [1] "1989-02-13"

Newest Summary

## [1] "2024-02-28"

Summaries of Directives and Regulations by Year

Recreating Hix Høyland (2024)

Distribution of Ideological Values

All Directives and Regulations (N = 74,734)
Please note the log-scaled y-axis
Negative values: More left
Positive values: More right
Dotted red line highlights ideological middle (value of 0)

All Summaries (N = 1637)
Please note the log-scaled y-axis
Negative values: More left
Positive values: More right
Dotted red line highlights ideological middle (value of 0)

Comparing Preamble with Summaries

Scatter Plot with Line of Equality
The red dashed line of equality shows where the values from the preamble and the summary would be equal. Points below the line indicate that the preamble value is greater than the summary value (i.e., the preamble is scored more to the right), and points above indicate the opposite. The plot applies jitter to reduce overplotting (e.g., at 0,0).

Mean-Difference Plot (Bland-Altman Plot)
The red dashed line marks a difference of zero, where the values from the preamble and the summary would be equal. Points below the line indicate that the preamble value is lower than the summary value (i.e., the preamble is scored more to the left), and points above indicate the opposite. The plot applies jitter to reduce overplotting.

In general, we can observe that summaries tend to be scored more to the right. We can also observe that summaries often receive a score of 0, while the (longer) preamble contains more data and thus receives a more differentiated score.
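The quantities behind a mean-difference plot can be sketched as follows. This is a minimal illustration, not the code used for the report: the scores are made-up values, and the difference is taken as preamble minus summary, which matches the reading of the plot given above.

```python
def mean_difference(preamble, summary):
    """Return (mean, difference) pairs, with difference = preamble - summary."""
    return [((p + s) / 2, p - s) for p, s in zip(preamble, summary)]

# Hypothetical scores for three acts (negative = left, positive = right)
preamble = [-2.0, 0.5, 1.0]
summary = [0.0, 0.5, 2.0]

points = mean_difference(preamble, summary)

# Average difference; a negative value means summaries score further right on average
bias = sum(d for _, d in points) / len(points)
print(points, bias)
```

In such a plot, the x-axis carries the mean of the two scores and the y-axis their difference, so a systematic shift shows up as a cloud of points lying mostly on one side of the zero line.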

Comparing Additional Preprocessing Steps

Do the results differ when more preprocessing steps are applied? Rheault and Cochrane (2020) perform subsampling, remove digits and words of two letters or fewer, and remove English stop words as well as overly common words in their corpus.

I do the same on a random subset of 5,000 legislative acts, using a subset to keep computation time low. I then compare the results with those obtained without the additional preprocessing steps.
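The token-level steps can be sketched roughly as below. This is an illustration under assumptions, not the pipeline used here: the stop word list is a tiny hypothetical stand-in for a full English list, and subsampling and the removal of corpus-specific common words are left out.

```python
import re

# Tiny illustrative stop word list; a real run would use a full English list.
STOP_WORDS = {"the", "and", "for", "that", "with", "this", "shall", "not"}

def preprocess(text, stop_words=STOP_WORDS, min_len=3):
    """Lowercase, then drop digits, words shorter than min_len, and stop words."""
    text = re.sub(r"\d+", " ", text.lower())   # remove digits
    tokens = re.findall(r"[^\W\d_]+", text)    # keep alphabetic runs only
    return [t for t in tokens if len(t) >= min_len and t not in stop_words]

example = "The Regulation (EC) No 1234/2007 shall apply to the import of beef in 2008."
tokens = preprocess(example)
print(tokens)
```

Even in this short example, most of the sentence is discarded, which illustrates how much text the additional steps can remove from a short document.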

Eyeballing the results, we can see that legislative acts with additional preprocessing tend to drift towards the right. We can also see clusters, e.g. in the RoBERT_rile plot, where acts that previously scored between -4 and +2 receive a score of 0 after preprocessing. This makes sense because the preprocessing steps remove data, making it more difficult for the model to generate a score.

Latent Semantic Scaling (LSS)

LSS is a method for measuring the semantic similarity between documents and a set of seed words. In our case, each directive or regulation is a document and the seed words are “typical” left and right terms. The semantic similarity between documents and seed words is calculated using word embeddings provided by GloVe. To keep computation efficient, I use the smallest pre-trained word vector model available (6B tokens, 400K vocab, 50 dimensions).
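The idea of scoring a document against seed words through embeddings can be sketched as follows. This is a simplified illustration, not the LSS implementation used in this report: the three-dimensional vectors are hypothetical stand-ins for GloVe vectors, the seed lists are shortened to single words, and the sign convention (positive = closer to the left seeds) is chosen to match the polarity scores reported in the evaluation further down.

```python
import math

# Toy 3-dimensional "embeddings"; hypothetical stand-ins for GloVe vectors.
VECTORS = {
    "welfare":       [0.9, 0.1, 0.0],
    "subsidies":     [0.8, 0.2, 0.1],
    "privatization": [0.1, 0.9, 0.0],
    "deregulation":  [0.2, 0.8, 0.1],
    "housing":       [0.7, 0.3, 0.2],
    "market":        [0.3, 0.7, 0.2],
}
LEFT_SEEDS = ["welfare", "subsidies"]
RIGHT_SEEDS = ["privatization", "deregulation"]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def word_polarity(word):
    """Mean similarity to left seeds minus mean similarity to right seeds."""
    vec = VECTORS[word]
    left = sum(cosine(vec, VECTORS[s]) for s in LEFT_SEEDS) / len(LEFT_SEEDS)
    right = sum(cosine(vec, VECTORS[s]) for s in RIGHT_SEEDS) / len(RIGHT_SEEDS)
    return left - right

def document_polarity(tokens):
    """Average polarity of the tokens that have a vector; 0.0 if none do."""
    known = [t for t in tokens if t in VECTORS]
    if not known:
        return 0.0
    return sum(word_polarity(t) for t in known) / len(known)

left_doc = document_polarity(["housing", "welfare", "unknownword"])
right_doc = document_polarity(["market", "deregulation"])
print(left_doc, right_doc)
```

Multi-word seed terms such as "wealth redistribution" would need an extra step in a sketch like this, for example averaging the vectors of the individual words; that choice is an assumption and is not specified in the text.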

Seed Word: Manual Seed Words

In a first approach, I manually come up with my own seed words:

Economic left seed words

##  [1] "wealth redistribution"          "state ownership"               
##  [3] "public sector jobs"             "universal basic income"        
##  [5] "progressive income tax"         "welfare programs"              
##  [7] "labor unions"                   "government subsidies"          
##  [9] "public healthcare"              "social housing"                
## [11] "minimum wage increase"          "unemployment benefits"         
## [13] "state intervention"             "public education funding"      
## [15] "affordable housing initiatives"

Economic right seed words

##  [1] "free market capitalism"       "privatization"               
##  [3] "corporate tax cuts"           "deregulation"                
##  [5] "fiscal austerity"             "trade liberalization"        
##  [7] "supply side economics"        "property rights"             
##  [9] "entrepreneurial incentives"   "limited government"          
## [11] "investment freedom"           "business deregulation"       
## [13] "lower tax rates"              "market driven wages"         
## [15] "reduction in public spending"

Social left seed words

##  [1] "lgbtq rights"             "gender equality"         
##  [3] "reproductive rights"      "anti-discrimination laws"
##  [5] "affirmative action"       "marriage equality"       
##  [7] "racial justice"           "environmental justice"   
##  [9] "prison reform"            "immigrant integration"   
## [11] "workers rights"           "secularism"              
## [13] "universal human rights"   "income equality"         
## [15] "cultural diversity"       "refugee rights"          
## [17] "climate justice"          "social inclusion"        
## [19] "anti-austerity"           "freedom of movement"     
## [21] "multiculturalism"

Social right seed words

##  [1] "traditional family values"    "pro-life"                    
##  [3] "national sovereignty"         "law and order"               
##  [5] "patriotism"                   "immigration control"         
##  [7] "anti multiculturalism"        "individualism"               
##  [9] "christian heritage"           "border security"             
## [11] "conservatism"                 "cultural identity protection"
## [13] "anti lgbtq adoption"          "cultural homogeneity"        
## [15] "anti secularism"              "tightening asylum policies"  
## [17] "pro national identity"

Seed Word: Systematic Approach

In a second approach, I apply two different models to extract seed words from text: Wordscores to extract seed words from party manifestos, and Wordfish to extract seed words from existing legislative acts.

Wordscores

I apply the Wordscores model using reference scores from the Manifesto Project. Specifically, I use rile […]. The following tables show the lowest- and highest-scoring terms (unigrams to 4-grams) extracted from the manifestos. For rile and markeco, negative values are associated with left-wing terms and positive values with right-wing terms. planeco and welfare sum left-typical manifesto categories, so for these two the direction is reversed: higher values indicate more left-wing terms.
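How Wordscores assigns a score to each term can be sketched as follows, based on the standard formulation by Laver, Benoit and Garry (2003): a word's score is the average of the reference scores, weighted by the probability of reading each reference text given that word. The two "manifestos" and their reference scores of -1 and +1 are hypothetical and only serve as an illustration.

```python
from collections import Counter

def wordscores(ref_texts, ref_scores):
    """Return {word: score} from tokenised reference texts and their known positions."""
    # Relative frequency of each word within each reference text
    rel_freqs = []
    for tokens in ref_texts:
        counts = Counter(tokens)
        total = sum(counts.values())
        rel_freqs.append({w: c / total for w, c in counts.items()})

    vocab = set().union(*rel_freqs)
    scores = {}
    for w in vocab:
        freqs = [rf.get(w, 0.0) for rf in rel_freqs]
        norm = sum(freqs)
        # P(reference r | word w), weighted by the reference score of r
        scores[w] = sum(f / norm * a for f, a in zip(freqs, ref_scores))
    return scores

# Hypothetical reference texts with made-up positions (-1 = left, +1 = right)
left_manifesto = "welfare equality welfare tax".split()
right_manifesto = "market tax market deregulation".split()

scores = wordscores([left_manifesto, right_manifesto], [-1.0, 1.0])
print(sorted(scores.items(), key=lambda kv: kv[1]))
```

Terms that occur in only one reference text receive that text's score in full. This may help explain why party names and other manifesto-specific tokens end up at the extremes of the tables below.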

RILE: Right-left position of party as given in Michael Laver/Ian Budge
Most negative values (left-wing terms)

##     token
##    gender
##  equality
##  proposes
##        token
##    die_linke
##    sinn_féin
##  labor_party
##                      token
##      children_young_people
##   greenhouse_gas_emissions
##  sexual_orientation_gender
##                                   token
##      sexual_orientation_gender_identity
##  convention_rights_persons_disabilities
##         reduce_greenhouse_gas_emissions

More results: Unigrams | Bigrams | Trigrams | 4-grams

Most positive values (right-wing terms)

##  token
##    vvd
##    sgp
##    svp
##            token
##  christian_union
##   progress_party
##    vlaams_belang
##                    token
##    free_movement_persons
##  vlaams_belang_advocates
##    danish_people's_party
##                                   token
##                new_nuclear_power_plants
##  proliferation_weapons_mass_destruction
##     small_medium-sized_enterprises_smes

More results: Unigrams | Bigrams | Trigrams | 4-grams

Planeco: per403 + per404 + per412
Most negative values (right-wing terms)

##    token
##   sweden
##  swedish
##      sgp
##             token
##    progress_party
##    canary_islands
##  sweden_democrats
##                    token
##  social_democratic_party
##    danish_people's_party
##   upper_secondary_school
##                                token
##       government_pension_fund_global
##             earned_income_tax_credit
##  autonomous_community_canary_islands

More results: Unigrams | Bigrams | Trigrams | 4-grams

Most positive values (left-wing terms)

##     token
##  proposes
##     banks
##     ecolo
##           token
##  social_housing
##       die_linke
##    cdh_proposes
##                          token
##           nuclear_power_plants
##         social_security_system
##  social_security_contributions
##                                   token
##        european_convention_human_rights
##  convention_rights_persons_disabilities
##      sexual_orientation_gender_identity

More results: Unigrams | Bigrams | Trigrams | 4-grams

Markeco: per401 + per414
Most negative values (left-wing terms)

##    token
##   canary
##  islands
##   greens
##           token
##  canary_islands
##     labor_party
##     green_party
##                     token
##  greenhouse_gas_emissions
##     children_young_people
##            green_new_deal
##                                   token
##      sexual_orientation_gender_identity
##  convention_rights_persons_disabilities
##                    equal_pay_equal_work

More results: Unigrams | Bigrams | Trigrams | 4-grams

Most positive values (right-wing terms)

##  token
##    fdp
##    vvd
##    svp
##               token
##  conservative_party
##      free_democrats
##      progress_party
##                  token
##  free_movement_persons
##  social_market_economy
##    progress_party_work
##                                   token
##     small_medium-sized_enterprises_smes
##  proliferation_weapons_mass_destruction
##               corporate_income_tax_rate

More results: Unigrams | Bigrams | Trigrams | 4-grams

Welfare: per503 + per504
Most negative values (right-wing terms)

##  token
##    svp
##    vvd
##    fdp
##            token
##    vlaams_belang
##    party_animals
##  christian_union
##                           token
##           social_market_economy
##  small_medium-sized_enterprises
##           free_movement_persons
##                                 token
##                  akel_left_new_forces
##  take_advantage_opportunities_offered
##             corporate_income_tax_rate

More results: Unigrams | Bigrams | Trigrams | 4-grams

Most positive values (left-wing terms)

##  token
##   sinn
##   féin
##   denk
##         token
##     die_linke
##  cdh_proposes
##     sinn_féin
##                   token
##  mental_health_services
##   children_young_people
##    sinn_féin_priorities
##                                   token
##                    equal_pay_equal_work
##  convention_rights_persons_disabilities
##      sexual_orientation_gender_identity

More results: Unigrams | Bigrams | Trigrams | 4-grams

Wordfish

Wordfish is an unsupervised method: it estimates document positions solely from the observed word frequencies. Due to computational constraints, I could only run the Wordfish model on random samples of 5k and 10k legislative acts. I expect that including more, or even all 70k, acts would not change the results enough to justify the longer compute time.

The tables below display features (i.e., tokens) and their respective beta values (i.e., the estimated effect of the token on the latent dimension). A positive beta indicates that the token is associated with the positive side of the dimension, a negative beta with the negative side. As the tables below show, the results are difficult to interpret: the tokens at both ends of the latent dimension (i.e., those with the highest and lowest beta values) cannot be assigned to a political dimension.
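For reference, the Wordfish model in its standard formulation (Slapin and Proksch 2008) treats the count of token j in document i as Poisson-distributed. The notation below follows that formulation and is not taken from the code used for this report:

```latex
y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad
\log \lambda_{ij} = \alpha_i + \psi_j + \beta_j \, \theta_i
```

Here \alpha_i is a document fixed effect (document length), \psi_j a token fixed effect (overall token frequency), \theta_i the position of document i on the latent dimension, and \beta_j the token weight reported as beta in the tables. The direction of the latent dimension is arbitrary unless it is anchored, so the sign of beta does not by itself correspond to left or right.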

Wordfish Scores: Random Subsample of 5,000 Legislative Acts (Unigrams, highest and lowest beta values)

##       feature     beta
## 1        efsa 43.75096
## 2        incl 41.48821
## 3         imo 41.14718
## 4        circ 39.79716
## 5       itu-r 37.67625
## 6      consol 35.14699
## 7    butyrate 32.35146
## 8  methylthio 29.25866
## 9     formate 29.04137
## 10        coe 28.60147
##        feature      beta
## 1    kolejowej -1.441089
## 2    ortslagen -1.434837
## 3        stary -1.431810
## 4         groß -1.428269
## 5   południowy -1.425471
## 6        drogi -1.422172
## 7       sofern -1.420133
## 8        drogę -1.418893
## 9    następnie -1.417684
## 10 wyznaczonej -1.416342

Wordfish Scores: Random Subsample of 5,000 Legislative Acts (Bigrams, highest and lowest beta values)

##                   feature     beta
## 1       contained_housing 51.48734
## 2               free_free 46.04209
## 3   monolithic_integrated 45.14960
## 4         form_monolithic 44.09294
## 5  letters_identification 30.71391
## 6                 ecu_net 22.21416
## 7                ecus_ecu 22.12461
## 8                 ecu_cus 22.00006
## 9             code_codice 21.49203
## 10             using_tail 19.83098
##                      feature          beta
## 1            legally_binding -0.4740856905
## 2   international_agreements -0.4612420543
## 3              national_food -0.0140582032
## 4  international_obligations -0.0009404775
## 5                rice_sector  0.0008225689
## 6         landing_obligation  0.0016989285
## 7          non-personal_data  0.0017392246
## 8               data_holders  0.0017393568
## 9              data_altruism  0.0017417777
## 10           recognised_data  0.0018706427

Wordfish Scores: Random Subsample of 10,000 Legislative Acts (Bigrams, highest and lowest beta values)

##               feature       beta
## 1        width_height 0.25166199
## 2     currently_force 0.21852978
## 3           free_text 0.10477695
## 4     special_edition 0.02915246
## 5  responsible_agency 0.01467287
## 6    list_responsible 0.01467231
## 7       qualifier_n.a 0.01467154
## 8          agency_n.a 0.01467150
## 9          number_n.a 0.01467080
## 10      related_place 0.01467034
##                     feature       beta
## 1  non-defaulted_applicable -210.20974
## 2  deliveries_non-defaulted -184.90778
## 3     protection_applicable -158.09237
## 4         exposures_without -157.40835
## 5      applicable_mortgages -130.05261
## 6                   mln_eur -121.98412
## 7            corporates_sme -121.98412
## 8               część_gminy -121.43343
## 9         corporates_credit  -91.31062
## 10        without_privilege  -84.66717

Wordscores & Wordfish Evaluation

Neither method returns satisfying results. Some terms returned by Wordscores faintly resemble the manual seed words, whereas almost none of the terms returned by Wordfish can be assigned to a clear political side or topic. The Wordscores and Wordfish seed words will therefore not be used in the LSS method for the time being.

LSS Evaluation

How well can LSS measure the left-right polarity of EU policies? To answer this question, I analyse the keywords that are attached to each document. The EU gives each policy a set of keywords that describe the policy’s content. If LSS works correctly, then the keywords should be associated with economic left/right terms. There are two sets of keywords, as described in the dataset’s codebook:

  1. EUROVOC: A group of EuroVoc keywords associated with the act. See here for details
  2. Subject Matter: Group of keywords representing the subject matter of the act. Similar to EUROVOC, only less detailed, more abstract.

LSS calculates a polarity score for each document. Values range between ca. -2.5 and +2. A negative score is associated with right terms, a positive score with left terms. I create three bins of equal width (which therefore contain unequal numbers of observations) and label them “left”, “centre” and “right”.
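Equal-width binning of this kind can be sketched as follows. The scores are made-up values spanning roughly the range mentioned above, and the function name is hypothetical; the lowest bin is labelled “right” because negative scores are associated with right terms.

```python
def equal_width_bins(scores, labels=("right", "centre", "left")):
    """Assign each score to one of len(labels) bins of equal width."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / len(labels)
    if width == 0:
        raise ValueError("all scores are identical; cannot form bins")
    binned = []
    for s in scores:
        # The maximum score falls exactly on the top edge and goes into the last bin
        idx = min(int((s - lo) / width), len(labels) - 1)
        binned.append(labels[idx])
    return binned

# Hypothetical polarity scores (negative = right, positive = left)
scores = [-2.5, -1.2, -0.1, 0.3, 0.4, 2.0]
bins = equal_width_bins(scores)
print(bins)
```

Because the bin edges depend only on the minimum and maximum score, a few extreme documents can stretch the outer bins and leave most documents in the “centre” bin.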

Number of documents per bin
Economy

centre   left  right 
 68019   6277    438 

Social

centre   left  right 
 26580  47133   1021 

Most frequent keywords in economic “left” bin

  EUROVOC_Keyword     Occurences
1 PDO                        502
2 product description        467
3 blockade                   452
4 ban on sales               433
5 economic sanctions         425

  Subject_matter_Keyword Occurences
1 marketing                    1515
2 agricultural policy          1029
3 health                       1021
4 agricultural activity         804
5 consumption                   790

Most frequent keywords in social “left” bin

  EUROVOC_Keyword          Occurences
1 Community aid to exports       5934
2 entry price                    5263
3 aubergine                      5243
4 citron                         5190
5 apple                          4686

    Subject_matter_Keyword Occurences
1 plant product               13369
2 trade policy                10386
3 prices                       7992
4 trade                        7936
5 tariff policy                7351

Most frequent keywords in economic “centre” bin

  EUROVOC_Keyword            Occurences
1 Community aid to exports         7570
2 automatic public tendering       5558
3 entry price                      5514
4 aubergine                        5463
5 citron                           5371

  Subject_matter_Keyword Occurences
1 plant product               17042
2 trade policy                14421
3 trade                       10926
4 tariff policy               10286
5 prices                       9357

Most frequent keywords in social “centre” bin

  EUROVOC_Keyword            Occurences
1 Community aid to exports         1649
2 automatic public tendering       1472
3 import                           1453
4 sea fish                         1434
5 catch plan                       1313

  Subject_matter_Keyword Occurences
1 trade policy                 4444
2 plant product                3961
3 Europe                       3905
4 trade                        3204
5 tariff policy                3088

Most frequent keywords in economic “right” bin

  EUROVOC_Keyword            Occurences
1 beef                               80
2 automatic public tendering         59
3 floor price                        48
4 EC country                         44
5 CCT duties                         41

  Subject_matter_Keyword Occurences
1 trade policy                  145
2 animal product                 84
3 prices                         80
4 Europe                         75
5 trade                          67

Most frequent keywords in social “right” bin

  EUROVOC_Keyword         Occurences
1 agri-foodstuffs product         99
2 food product safety             97
3 fungicide                       85
4 blockade                        71
5 agricultural product            69

  Subject_matter_Keyword Occurences
1 marketing                     165
2 tariff policy                 158
3 agricultural activity         146
4 trade policy                  125
5 foodstuff                     123

As a comparison, here are the most frequent keywords overall:

  EUROVOC_Keyword            Occurences
1 Community aid to exports         7590
2 automatic public tendering       5633
3 entry price                      5518
4 aubergine                        5481
5 citron                           5387

  Subject_matter_Keyword Occurences
1 plant product               17447
2 trade policy                14955
3 trade                       11202
4 tariff policy               10597
5 prices                       9460

Keyword Results

A first analysis of the results shows that the keywords partially align with typical economic left/right terms. Keywords like “equal opportunity”, “State pension”, “employment”, “labour market” and “social protection” fit well with an economic left ideology, while “trade policy”, “tariff policy” and “international trade” can be more strongly associated with an economic right ideology.

However, there are keywords that appear to have no ideological connotation but still appear in the “most frequent” list, e.g., “exchange of information”, “beef” or “Europe”. This indicates that the polarity score calculated by LSS has its flaws.

Eyeballing the results, the Subject Matter keywords appear more suitable for the evaluation than the EUROVOC keywords: they align better with my expectations. This is likely because they are more general and thus capture a wider range of topics and themes within the dataset, while the EUROVOC keywords are too specific to do so.

LSS Plots

Overview LSS Scores

LSS Economy vs LSS Social Scores

LSS vs Hix Høyland (normalised and reversed scores)
LSS scores are reversed in these plots: Negative values are ideologically left, positive values are ideologically right.

ChatGPT: 0-Shot

Overview

ChatGPT vs Hix Høyland